## Intro to Processor Architecture – End Semester Exam (Spring 2023)

Max Time: 90 minutes

Max Marks: 60

Instructions: Please give only the points required to answer each question. You need not write an essay. On average, you should spend 7.5 minutes per question.

- 1) Write an assembly-level program (using the Y86-64 instruction set) to compute the factorial of a number stored at a location in memory. Please give compact code; the result must be stored back into another memory location. (5 marks)
- 2) Answer the following.
  - a) What is a programmer visible state? (1.5 marks)
  - b) Please list what constitutes the programmer visible state for a Y86-64 program. (3.5 marks)
- 3) A large program with 6 million instructions needs to be executed on a processor with a clock frequency of 2 GHz.
  - a) If the first 30% of the instructions run in 3 clock cycles each, the next 30% of the instructions run in 5 clock cycles each, and the remaining 40% of the instructions run in 8 clock cycles each, what is the execution time of the program? (2.5 marks)
  - b) If, in scenario 3(a) above, the first 40% of the instructions can be optimized to execute in 2 clock cycles each, what is the change in the execution time of the program? (2.5 marks)
- 4) Please answer the following.
  - (a) What is a control hazard? (1 mark)
  - (b) Briefly explain how you will solve the two control hazards in a 5-stage pipeline architecture. (4 marks)
- 5) For a 5-stage pipeline consisting of fetch, decode, execute, memory and writeback stages, please answer the following.
  - (a) What is the issue encountered with the following instructions? (2.5 marks) mrmovq (%rdx),%rbx
  - (b) What will happen to the above issue if you change the pipeline architecture to consist of only 4 stages: fetch, decode, execute+memory, and writeback? Please explain briefly. (2.5 marks)
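
For question 3, the execution-time arithmetic can be sketched in a few lines of Python. This is a minimal sketch rather than a model answer: the names `total_cycles` and `execution_time_seconds`, and the representation of the instruction mix as (percent, cycles) pairs, are assumptions made for illustration. For part (b) it assumes that "the first 40%" covers the original first 30% plus the first third of the next 30%, which is only one possible reading of the question.

```python
def total_cycles(instruction_count, mix):
    """Sum cycles over (percent_of_instructions, cycles_per_instruction) pairs."""
    assert sum(pct for pct, _ in mix) == 100, "percentages must add up to 100"
    return sum(instruction_count * pct // 100 * cpi for pct, cpi in mix)


def execution_time_seconds(instruction_count, mix, clock_hz):
    """Execution time = total clock cycles / clock frequency."""
    return total_cycles(instruction_count, mix) / clock_hz


INSTRUCTIONS = 6_000_000
CLOCK_HZ = 2_000_000_000  # 2 GHz

# Question 3(a): 30% at 3 cycles, 30% at 5 cycles, 40% at 8 cycles.
baseline_mix = [(30, 3), (30, 5), (40, 8)]

# Question 3(b), ASSUMED reading: the first 40% now take 2 cycles each,
# leaving 20% at 5 cycles and the last 40% at 8 cycles.
optimized_mix = [(40, 2), (20, 5), (40, 8)]

baseline = execution_time_seconds(INSTRUCTIONS, baseline_mix, CLOCK_HZ)
optimized = execution_time_seconds(INSTRUCTIONS, optimized_mix, CLOCK_HZ)

print(f"3(a): {baseline * 1e3:.2f} ms")
print(f"3(b): {optimized * 1e3:.2f} ms, change {(baseline - optimized) * 1e3:.2f} ms")
```

Under these assumptions the baseline needs 33.6 million cycles and the optimized mix needs 30 million cycles; a different reading of "the first 40%" only requires changing `optimized_mix`.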

- 6) Please explain how the push and pop instructions work by describing what actions are taken in each of the 5 stages in the pipelined architecture. (5 marks)
- 7) Suppose a processor architecture has an arbitrary number of pipeline stages n, and each pipeline stage has a delay of 400/n ps. Please answer the following. (Assume the pipeline register delay is 30 ps.)
  - (a) What are the latency and throughput of the system in terms of n? (2 marks)
  - (b) What is the limit on the throughput that can be achieved as the number of pipeline stages increases? (3 marks)
- 8) If a processor is designed in such a manner that all the stages execute within one clock cycle of the processor, please list the issues with this kind of architecture. (5 marks)
- 9) Suppose there is a 5-stage pipelined architecture consisting of fetch, decode, execute, memory and writeback. For the following instruction sequences, will any problem be encountered? Please explain with a diagram. If there is a problem, what is the best way to solve it? (5 marks)
  - (a) irmovq \$8,%rbx irmovq \$7,%rcx nop addq %rbx,%rcx
  - (b) irmovq \$8,%rbx irmovq \$7,%rcx addq %rbx,%rcx
- 10) What is a page table? Please explain how it is relevant to accessing data from physical memory. (5 marks)
- 11) Please answer the following questions.
  - (a) What is the size of the page table required if the virtual address has 48 bits, each page table entry is 4 bytes, and the page size is 4 KB? (2.5 marks)
  - (b) Please comment on the page table size that you obtain. (2.5 marks)
- 12) Please list all the blocks where the principle of locality is exploited, and explain how it helps the virtual memory system perform well. (5 marks)
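
For question 11(a), the size calculation can be sketched as follows. This is a hedged illustration, not a model answer: it assumes a single-level (flat) page table with one entry per virtual page, and it takes 4 KB to mean 4096 bytes. The function name `flat_page_table_bytes` is hypothetical.

```python
def flat_page_table_bytes(virtual_address_bits, page_size_bytes, entry_size_bytes):
    """Size of a single-level page table with one entry per virtual page."""
    # The page size must be a power of two so that the offset is a whole number of bits.
    assert page_size_bytes > 0 and page_size_bytes & (page_size_bytes - 1) == 0
    offset_bits = page_size_bytes.bit_length() - 1       # 4096 bytes -> 12 bits
    num_pages = 1 << (virtual_address_bits - offset_bits)  # number of virtual pages
    return num_pages * entry_size_bytes


# Parameters from question 11(a); a flat table is an ASSUMPTION of this sketch.
size = flat_page_table_bytes(virtual_address_bits=48,
                             page_size_bytes=4 * 1024,
                             entry_size_bytes=4)

print(size, "bytes =", size // 2**30, "GiB")
```

With these parameters there are 2^36 virtual pages, so the flat table occupies 2^38 bytes, which is the figure that part (b) asks you to comment on.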